What is claimed is: 

1 . A method for balancing the load of a parallel processing system having a plurality of 
parallel processing elements arranged in a loop, wherein each processing element (PE^) has a 
local number of tasks associated therewith, wherein r represents the number for a selected 
processing element, and wherein each of said processing elements is operable to communicate 
with a clockwise adjacent processing element and with an anti-clockwise adjacent processing 
element, the method comprising: 

determining a total number of tasks present within said loop; 
calculating a local mean number of tasks for each of said plurality of processing 
elements; 

calculating a local deviation for each of said plurality of processing elements; 

determining a running partial deviation sum for each of said plurality of 
processing elements; 

determining a clockwise transfer parameter and an anti-clockwise transfer 
parameter for each of said plurality of processing elements; and 

redistributing tasks among said plurality of processing elements in response to 
said clockwise transfer parameter and said anti-clockwise parameter for each of said 
plurality of processing elements. 

2. The method of claim 1 wherein said determining a total number of tasks present 
within said loop, comprises: 

transmitting said local number of tasks associated with each of said plurality of 
processing elements to each other of said plurality of processing elements within said 
loop; 

receiving within each of said plurality of processing elements said number of 
local tasks associated with said each other of said plurality of processing elements; 
and 

summing said number of local tasks associated with each of said plurality of 
processing elements with said number of local tasks associated with each other of 
said plurality of processing elements. 

3. The method of claim 1 wherein said determining said total number of tasks present 
within said loop includes solving the equation V = ^ v,. , where iS/^ represents the number of 

1=0 

said processing elements in said loop and v, represents said local number of tasks associated 
with an i^^ processing element in said loop. 
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4. The method of claim 1 wherein said calculating a local mean number of tasks within 
each of said plurality of processing elements includes solving the equation 

= Trunc{(y + E^)/ N), where Mr is said local mean for said PE^., N is the total number 
of said processing elements in said loop, and Er is a number in the range of 0 to and 
wherein each processing element has a different Er value. 

5. The method of claim 4 wherein Er controls said Trunc function such that said total 
number of tasks for said loop is equal to the sum of the local mean number of tasks for each 
of said plurality of processing elements in said loop. 

6. The method of claim 4 wherein said local mean = Trunc({V + E^)I N) for each 
local FEr within said loop is equal to one of ^and (A"+l). 

7. The method of claim 1 wherein said calculating a local deviation within each of said 
plurality of processing elements comprises finding the difference between said local number 
of tasks for each of said plurality of processing elements and said local mean number for each 
of said plurality of processing elements. 

8. The method of claim 1 wherein said determining a running partial deviation sum for 
each of said plurality of processing elements comprises: 

transmitting said local deviation associated with each of said plurality of 
processing elements to another of said plurality of processing elements within said 
loop; 

receiving within each of said plurality of processing elements said local deviation 
associated with at least one other of said plurality of processing elements; and 

summing said local deviation associated with each of said plurality of processing 
elements with said local deviation associated with at least one other of said plurality 
of processing elements. 

9. The method of claim 1 wherein said determining a running partial deviation sum for 

i=J 

each of said plurality of processing elements comprises solving the equation Sj = ^ A , 

1=0 

where Sj represents said running partial deviation sum, A represents the local deviation 
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associated with the /'^ processing element, and j 1) where ATis the number of 

processing elements on said loop. 

10. The method of claim 1 wherein said determining a clockwise transfer parameter and 
an anti-clockwise transfer parameter within each of said processing elements comprises: 

setting Ta = {Hr'^Lr)-^ 2; and 

setting Tc = D-'Ta where Hr represents a highest extrema of said running partial 
deviation sum; Lr represents a lowest extrema of said running partial deviation sum, 
D represents the local deviation of a selected local processing element; and Tc 
represents said clockwise transfer parameter, and Ta represents said anti-clockwise 
transfer parameter. 

11. A method for reassigning tasks associated with a selected processing element within a 
parallel processing system having a plurality of processing elements connected in a loop, each 
of said plurality of processing elements having a local number of tasks associated therewith, 
the method comprising: 

determining the total number of tasks on said loop; 

computing a local mean value for said selected processing element; 

computing a local deviation for said selected processing element, said local 
deviation representative of the difference between said local number of tasks for said 
selected processing element and said local mean value for said selected processing 
element; 

determining a running partial deviation sum for said selected processing element; 

computing a number of tasks to transfer in a clockwise direction for said selected 
processing element; 

computing a number of tasks to transfer in an anti-clockwise direction for said 
selected processing element; and 

reassigning said tasks associated with said selected processing element relative to 
the said number of tasks to transfer in a clockwise direction and said number of task 
to transfer in an anti-clockwise direction. 



12. The method of claim 1 1 wherein said determining the total number of tasks on said 
loop, comprises: 

receiving within said selected processing element said number of local tasks 
associated with said each other of said plurality of processing elements; and 
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summing said number of local tasks associated with said selected processing 
element and said number of local tasks associated with each other of said plurality of 
processing elements. 

13. The method of claim 1 1 wherein computing a local mean value for a selected 
processing element includes solving the equation = Trunc{(y + E^)/ N) , where Mr 
represents said local mean for a PE^, iV represents the total number of processing elements in 
said loop, and Er is a number in the range of 0 to 

14. The method of claim 13 wherein Er controls said Trunc function such that said total 
number of tasks for said loop is equal to the sum of the local mean number of tasks for each 
of said plurality of processing elements in said loop and wherein each processing element has 
a different Er value assigned. 

15. The method of claim 1 1 wherein said determining a running partial deviation sum for 
said selected processing element comprises: 

receiving within said selected processing element a local deviation associated 
with at least one other of said plurality of processing elements; and 

summing said local deviation associated with said selected processing element 
and said local deviation associated with at least one other of said plurality of 
processing elements. 

16. The method of claim 1 1 wherein said determining a running partial deviation sum for 
said selected processing element comprises solving the equation Sj = , where Sj 

1=0 

represents said running partial deviation sum, A represents the local deviation associated with 
an processing element, and j ^N- 1) where iV is the number of said plurality of processing 
elements on said loop. 

17. The method of claim 1 1 wherein said computing a local mean value, said computing 
a local deviation, said determining a running partial deviation sum, computing a number of 
tasks to transfer in a clockwise direction, computing a number of tasks to transfer in an anti- 
clockwise direction, and said reassigning tasks relative to the said number of task to transfer 
in a clockwise direction and said number of tasks to transfer in an anti-clockwise direction are 
completed simultaneously for each of said plurality of processing elements within said loop. 
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18. The method of claim 1 1 wherein said computing a number of tasks to transfer in a 
clockwise direction for said selected processing element includes evaluating at least one of a 
maximum extrema and a minimum extrema of said running partial deviation sum. 

19. The method of claim 1 1 wherein said computing a number of tasks to transfer in an 
anti-clockwise direction for said selected processing element includes evaluating at least one 
of a maximum extrema and a minimum extrema of said running partial deviation sum. 



20. A memory device carrying a set of instructions which, when executed, perform a 
method comprising: 

determining a total number of tasks present within said loop; 

calculating a local mean number of tasks for each of said plurality of processing 
elements; 

calculating a local deviation for each of said plurality of processing elements; 

determining a running partial deviation sum for each of said plurality of 
processing elements; 

determining a clockwise transfer parameter and an anti-clockwise transfer 
parameter for each of said plurality of processing elements; and 

redistributing tasks among said plurality of processing elements in response to 
said clockwise transfer parameter and said anti-clockwise parameter for each of said 
plurality of processing elements. 
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